• Negative predictive value (NPV; probability of actually being negative) = TN / (TN + FN)
• Accuracy (correct classification rate) = (TP + TN) / (TP + FP + TN + FN)
• Misclassification rate = (FP + FN) / (TP + FP + TN + FN)
• Prevalence (proportion of actually positive persons in the total number) = (TP + FN) / (TP + FP + TN + FN)
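The metrics above can be computed directly from the four confusion-matrix counts. A minimal sketch (the TP/FP/TN/FN counts are made-up example values, not data from the text):

```python
# Hypothetical confusion-matrix counts for illustration only
TP, FP, TN, FN = 40, 10, 45, 5
total = TP + FP + TN + FN

npv = TN / (TN + FN)                    # negative predictive value
accuracy = (TP + TN) / total            # correct classification rate
misclassification = (FP + FN) / total   # equals 1 - accuracy
prevalence = (TP + FN) / total          # proportion of actually positive cases

print(npv, accuracy, misclassification, prevalence)
# 0.9 0.85 0.15 0.45
```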
For the graphical representation, a ROC curve (Receiver Operating Characteristic; x-axis: false positive rate, y-axis: sensitivity) is often used, where the AUC (Area Under the Curve) is a measure of the quality of the classification (higher AUC value = better classification). An ideal classification model has a 100% true positive rate (100% sensitivity) and a 0% false positive rate (100% specificity), but this is rarely achieved in reality. For example, in a recent paper we were able to show that a novel real-time PCR has better predictive power for the detection of Trypanosoma cruzi in Chagas disease and is superior to previous PCR methods, but is still not 100% accurate (Kann et al. 2020). In any case, it is advisable to always build a prediction model on the basis of a training and a test data set and to validate it on at least one independent data set, in order to reliably assess its predictive power for a possible application, such as a clinical decision support system.
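The AUC has a useful probabilistic reading: it is the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative one. A minimal sketch of this computation (the labels and scores below are toy values for illustration):

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney interpretation); ties count half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A perfect classifier ranks all positives above all negatives
print(auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.2]))  # 1.0
# Misranked cases lower the AUC
print(auc([1, 1, 0, 0], [0.9, 0.2, 0.3, 0.4]))  # 0.5
```

For real data, library routines such as scikit-learn's `roc_curve` and `roc_auc_score` compute the same quantities efficiently.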
Artificial Neural Networks Another possibility for machine learning is the use of simple neural networks, which consist of an input layer, a simple intermediate layer, and an output layer. Connections between these three layers are strengthened or weakened so that the output is as accurate as possible. To do this, the neural network is trained on a training dataset (automatically: unsupervised; with human review: supervised) and its accuracy is then checked on a separate test dataset. This can then be used, for example, to generate an optimal prediction of helix and beta boundary regions in protein structures (PredictProtein software, https://predictprotein.org) and to determine protein localization. The deep learning approach extends the simple neural network by several layers of intermediate neurons, which in particular get by with fewer neurons in the later layers (and thus bring results together, "converge"). This replicates, in very simplified terms, an abstraction of the many inputs into more general concepts. These networks are more complex to train ("back-propagation" and other steps) but, often further improved with other strategies from artificial intelligence research, also achieve amazing things, such as optical image recognition of leukemia cells through improved swarm optimization (Sahlol et al. 2020) or the automatic recognition of secondary structure and oligonucleotides in electron micrographs (Mostosi et al. 2020); eventually even antibiotics can be discovered with this deep learning approach (Stokes et al. 2020), as can the energy potentials and thus the three-dimensional structure of proteins (Senior et al. 2020), now culminating in large-scale and accurate deep-learning-based prediction of human proteins (Tunyasuvunakool et al. 2021).
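The basic training loop described above (forward pass, then back-propagation of the error) can be sketched on a toy problem. The network below, with two hidden neurons learning the logical OR function, is an illustrative assumption; layer sizes, learning rate, and epoch count are made up for the sketch and have nothing to do with the protein-prediction systems cited:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
# 2 inputs -> 2 hidden neurons -> 1 output (toy architecture)
w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
b1 = [0.0, 0.0]
w2 = [random.uniform(-1, 1) for _ in range(2)]
b2 = 0.0

# Training data: truth table of logical OR
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
lr = 1.0  # learning rate (illustrative)

for _ in range(2000):
    for x, t in data:
        # forward pass
        h = [sigmoid(sum(w1[j][i] * x[i] for i in range(2)) + b1[j])
             for j in range(2)]
        y = sigmoid(sum(w2[j] * h[j] for j in range(2)) + b2)
        # backward pass: squared-error loss, gradients via the chain rule
        dy = (y - t) * y * (1 - y)
        for j in range(2):
            dh = dy * w2[j] * h[j] * (1 - h[j])  # before updating w2[j]
            w2[j] -= lr * dy * h[j]
            for i in range(2):
                w1[j][i] -= lr * dh * x[i]
            b1[j] -= lr * dh
        b2 -= lr * dy

# After training, predictions should round to the OR truth table
for x, t in data:
    h = [sigmoid(sum(w1[j][i] * x[i] for i in range(2)) + b1[j])
         for j in range(2)]
    y = sigmoid(sum(w2[j] * h[j] for j in range(2)) + b2)
    print(x, round(y))
```

Deep learning stacks many such layers and relies on the same gradient mechanics, plus the additional strategies mentioned above, to scale far beyond this toy setting.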
14 We Can Think About Ourselves – The Computer Cannot